Geospatial Analysis

Task 2.2

Importing Libraries

A list of packages and libraries is imported:

  1. Pandas - Open source library providing high-performance, easy-to-use data structures and data analysis tools.
  2. GeoPandas - A powerful package for spatial manipulation.
  3. Plotly - The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.

Reading shape files

The spatial data files for urban population and total population are read with GeoPandas using the ".read_file" function.

Task 2.2 Solutions

2.2.1 Urban population per capita for only the countries having population greater than 290000000 in the year 2010.
Plotting Choropleth Map

For the year 2010, the countries with a population greater than "290000000" are China, India and the United States. Their urban population per capita values are "0.49266", "0.3093" and "0.80772" respectively. In other words, in 2010 around 31% of India's population lived in urban areas, about half of China's population did, and around 80% of the population of the United States did, which is consistent with the United States being more developed than India and China.
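The selection described above can be sketched in pandas as follows. This is a minimal illustration rather than the notebook's original code: the frame `df`, its column names (`country`, `pop_2010`, `urban_2010`) and the three placeholder rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data: names and numbers are placeholders, not the real dataset.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "pop_2010": [300_000_000, 50_000_000, 1_200_000_000],
    "urban_2010": [240_000_000, 20_000_000, 600_000_000],
})

# Urban population per capita = urban population / total population.
df["urban_per_capita_2010"] = df["urban_2010"] / df["pop_2010"]

# Keep only the countries whose 2010 population exceeds the threshold.
large = df[df["pop_2010"] > 290_000_000]
print(large[["country", "urban_per_capita_2010"]])
```

The same pattern with `<` in place of `>` would cover the "less than" selection in 2.2.2.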

2.2.2 Urban population per capita for only the countries having population less than 69000000 in the year 2010.
Plotting Choropleth Map

As the choropleth map above shows, many countries had a population of less than "69000000" in 2010. The country with the highest urban population per capita in 2010 was Qatar, at 98.5%. Although Qatar is not a large country, an urban share of 98.5% suggests that it can be called a developed and advanced country.

Many countries fall in the range of 80% to 95% urban population per capita in 2010, such as Uruguay (94.4%), Iceland (93.5%), Argentina (90.8%), Australia (85.2%), New Zealand (86.16%), Chile (87.07%), Sweden (85.05%), Finland (83.77%), Greenland (84.38%) and Canada (80.9%). By this measure, these countries can be regarded as less developed than Qatar in that year.

The country with the lowest urban population per capita in 2010 was Burundi. Burundi is a small East African country where only 10.6% of the population was urban in 2010, which means around 90% of its people lived in rural areas.

2.2.3 Urban population per capita for only the countries having urban population between 110146163 and 223096279 in the year 2010.
Plotting Choropleth Map

For the year 2010, the countries with an urban population between "110146163" and "223096279" are Brazil, Indonesia and Japan. Their urban population per capita values are "0.84335", "0.49914" and "0.90812" respectively. In other words, in 2010 around 84% of Brazil's population lived in urban areas, about half of Indonesia's population did, and around 90% of the people of Japan did, which is consistent with Japan being a well developed country.
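A range selection like this one can be written with the pandas `Series.between` method. The sketch below is an assumption about how it might look, not the notebook's code: the frame `urban`, its columns and the placeholder rows are hypothetical, and the bounds are treated as inclusive (the pandas default), which the text does not specify.

```python
import pandas as pd

# Hypothetical toy data: placeholder countries and urban population counts.
urban = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "urban_2010": [90_000_000, 120_000_000, 200_000_000, 600_000_000],
})

# Boolean mask that is True where the value lies within the two bounds.
mask = urban["urban_2010"].between(110_146_163, 223_096_279)
selected = urban[mask]
print(selected)
```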

2.2.4 Percentage change in the urban population per capita from 1990 to 2010, for the country having the highest population in 2010.

The ".max" function of pandas is used to find the country with the highest population in 2010.

The percentage change in the urban population per capita from 1990 to 2010 for that country is saved in the dataframe in a column named "Percentage Change".
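The calculation is (value in 2010 - value in 1990) / value in 1990 * 100. A minimal sketch under stated assumptions follows: the frame, the column names `pop_2010`, `upc_1990` and `upc_2010`, and all numbers are hypothetical placeholders. It uses `idxmax` instead of ".max" because `idxmax` returns the row label of the maximum rather than the maximum value itself.

```python
import pandas as pd

# Hypothetical toy data indexed by placeholder country names.
df = pd.DataFrame(
    {
        "pop_2010": [50.0, 900.0, 300.0],
        "upc_1990": [0.40, 0.25, 0.70],  # urban population per capita, 1990
        "upc_2010": [0.50, 0.50, 0.77],  # urban population per capita, 2010
    },
    index=["A", "B", "C"],
)

# Label of the country with the highest 2010 population.
top = df["pop_2010"].idxmax()

# Percentage change relative to the 1990 value.
row = df.loc[top]
pct = (row["upc_2010"] - row["upc_1990"]) / row["upc_1990"] * 100

# Store the result; rows for the other countries stay empty (NaN).
df.loc[top, "Percentage Change"] = pct
print(top, pct)
```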

Plotting Choropleth Map

The country with the highest population in 2010 is "China". China's urban population per capita grew rapidly from 26.4% in 1990 to 49.2% in 2010, a percentage change of "86.16%". This shows that China developed considerably in the 20 years from 1990 to 2010.

Using the pandas ".get_loc()" method to find the position of the columns.

The ".get_loc()" method of a pandas Index returns the integer location, slice or boolean mask for the requested label. It works with both sorted and unsorted Indexes.
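As a small illustration, the snippet below looks up column positions on an empty frame. The column labels are assumptions chosen for the example; the real dataset's labels may differ.

```python
import pandas as pd

# Hypothetical column layout: one name column followed by year columns.
df = pd.DataFrame(columns=["country", "1990", "2000", "2010"])

# For a unique label, get_loc returns its zero-based integer position.
start = df.columns.get_loc("1990")
end = df.columns.get_loc("2010")
print(start, end)
```

These positions can then be passed to ".iloc" to slice the year columns.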

2.2.5 Mean per capita world urban population (from 1990 to 2010).

The ".iloc" indexer in pandas is used to select rows and columns by integer position, in the order in which they appear in the dataframe.

The mean per capita world urban population (from 1990 to 2010) of each country is saved in the dataframe in a column named "mean_per_cap".
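One way this step might look is sketched below. The frame and its values are hypothetical placeholders; only the column name "mean_per_cap" comes from the text.

```python
import pandas as pd

# Hypothetical toy data: per capita urban population for three years.
df = pd.DataFrame({
    "country": ["A", "B"],
    "1990": [0.2, 0.6],
    "2000": [0.3, 0.7],
    "2010": [0.4, 0.8],
})

# Integer positions of the first and last year columns.
first = df.columns.get_loc("1990")
last = df.columns.get_loc("2010")

# Select the year columns by position (end is exclusive, hence + 1)
# and average across them row by row.
df["mean_per_cap"] = df.iloc[:, first:last + 1].mean(axis=1)
print(df[["country", "mean_per_cap"]])
```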

Plotting Choropleth Map

The mean urban population per capita from 1990 to 2010 for every country is shown in the choropleth map above.

2.2.6 Correlation plot between mean world population and mean per capita world urban population (from 1990 to 2010).

The ".iloc" indexer of pandas is used to select specific columns of the dataframe.

The mean world population over the years 1990 to 2010 is saved in the dataframe in a column named "Mean_WP".

The pandas ".concat" function is used to combine the columns of two different dataframes.
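A minimal sketch of the column-wise combination, assuming both frames share the same index. The values and the index labels are hypothetical; the column names follow the text.

```python
import pandas as pd

# Hypothetical toy frames with a shared index of placeholder countries.
mean_pop = pd.DataFrame({"Mean_WP": [100.0, 200.0]}, index=["A", "B"])
mean_upc = pd.DataFrame({"mean_per_cap": [0.3, 0.7]}, index=["A", "B"])

# axis=1 places the frames side by side, aligning rows on the index.
combined = pd.concat([mean_pop, mean_upc], axis=1)
print(combined)
```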

Scatter Plot

Scatterplots are a fundamental graph type, and much less complicated than histograms and boxplots.

From the plot above we can conclude that the relationship between "mean world population" and "mean per capita world urban population" is weak, as the points in the scatter plot lie near 0.
To express the correlation as a single number, we can use the "correlation coefficient" together with a "correlation matrix", which describes the extent of the linear relationship between two variables.

Coefficient of Correlation and Correlation Matrix using Heat Map

A correlation matrix is a simple way to summarize the correlations between all variables in a dataset. Our dataset contains two variables, "Mean world population" and "Mean per capita urban population". It would be very difficult to understand the relationship between them by simply staring at the raw data. Fortunately, a correlation matrix can help us quickly understand the correlation between this pair of variables.

We want to understand the relationship between "Mean world population" and "Mean per capita urban population". One way to quantify this relationship is to use the Pearson correlation coefficient, which is a measure of the linear association between two variables. It has a value between -1 and 1 where:

"-1" indicates a perfectly negative linear correlation between two variables.
"0" indicates no linear correlation between two variables.
"1" indicates a perfectly positive linear correlation between two variables.
The further away the correlation coefficient is from zero, the stronger the relationship between the two variables.

Pandas DataFrame’s ".corr()" method is used to compute the correlation and seaborn’s "heatmap()" method is used to plot the matrix.
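The correlation step can be sketched as follows. The numbers are hypothetical and chosen so that the two columns are perfectly negatively correlated; they are not the dataset's values. The plotting call is left out to keep the example dependency-free.

```python
import pandas as pd

# Hypothetical toy data: the second column falls linearly as the first rises.
data = pd.DataFrame({
    "Mean_WP": [1.0, 2.0, 3.0, 4.0],
    "mean_per_cap": [0.8, 0.6, 0.4, 0.2],
})

# DataFrame.corr() computes pairwise Pearson correlation by default.
corr = data.corr()
print(corr)
```

Passing `corr` to seaborn's "heatmap()" (for example with `annot=True`) would then draw the matrix with the coefficients written in each cell.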

Each cell in the plot above shows the correlation between two variables. The black cells show that the correlation between "Mean world population" and "Mean per capita" is -0.082, which indicates that they are weakly negatively correlated. Also notice that the correlation coefficients along the diagonal of the plot are equal to 1 because each variable is perfectly correlated with itself.